Chapter 9: Web Application Development and Web Crawlers
WEB CS, ZJU 2018 12
Overview
9.1 Overview of Web application development
9.2 The Dash Web application framework
9.3 Deploying a Web application on PythonAnywhere
9.4 Web crawlers
9.1 Overview of Web Application Development
Web Applications
A Web application is accessed through a Web browser and follows the Browser/Server (B/S) architecture.
Example: https://etf50.pythonanywhere.com
[Example 9-1] A minimal Web application
How a Web Application Runs
1. Run the Python program.
2. Access the URL.
3. Flask looks up the URL.
4. URL
5.
6.
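The request cycle above can be sketched with the standard library alone. This is a minimal stand-in, not Flask itself: the names `route`, `ROUTES`, `application` and `fake_request` are hypothetical, and the "Hello Python!" text is an assumed page body. It shows a URL path being mapped to a Python function whose return value becomes the response.

```python
from wsgiref.util import setup_testing_defaults

ROUTES = {}  # maps a URL path to the function that builds the page


def route(path):
    """Register a function for a URL path (similar in spirit to @app.route)."""
    def register(func):
        ROUTES[path] = func
        return func
    return register


@route("/")
def hello():
    return "Hello Python!"


def application(environ, start_response):
    """WSGI entry point: look up the requested path and call its function."""
    handler = ROUTES.get(environ.get("PATH_INFO", "/"))
    if handler is None:
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Not Found"]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [handler().encode("utf-8")]


def fake_request(path):
    """Call the application as a server would, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], body.decode("utf-8")


print(fake_request("/"))
print(fake_request("/missing"))
```

To serve it to a real browser, `wsgiref.simple_server.make_server("", 5000, application).serve_forever()` should work; the port 5000 is chosen here only to mirror the Flask default.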
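URL: Uniform Resource Locator
General form of a URL (Uniform Resource Locator):
    protocol://hostname[:port]/path/[;parameters][?query]#fragment
(1) protocol: the transfer protocol used, for example https.
(2) hostname[:port]: the host name, optionally followed by a port number. Flask uses port 5000 by default; www.baidu.com is an example of a host name.
(3) path: the path of the resource on the host.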
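The parts of a URL can be inspected with the standard library. A small sketch follows; the URL itself is a made-up example chosen so that every component of the general form is present.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL that exercises every component of the general form.
url = "https://www.example.com:5000/docs/index.html;v=1?lang=en&page=2#intro"
parts = urlparse(url)

print("protocol  :", parts.scheme)
print("hostname  :", parts.hostname)
print("port      :", parts.port)
print("path      :", parts.path)
print("parameters:", parts.params)
print("query     :", parts.query)
print("fragment  :", parts.fragment)

# The query string can be decoded further into a dict of lists.
query = parse_qs(parts.query)
print(query)
```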
The Two Parts of Web Application Development
Web pages: HTML + CSS
Web server program: Python + Flask, Dash
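HyperText Markup Language (HTML)
Web pages are written in HTML. An HTML file has the extension .htm or .html and can be edited with a plain-text (TXT) editor.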
[Example 9-2] A sample web page: the HTML code
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <title> </title>
    </head>
    <body>
        <h1> </h1>
        <!-- -->
        <hr>
        <br>
        <br>
        <br>
        <img src="pic1.PNG" width="350" height="250"><img src="pic2.PNG" width="350" height="250">
        <ul>
            <li><a href="http://product.china-pub.com/4878243"> </a></li>
            <li><a href="http://product.china-pub.com/4639796"> </a></li>
            <li><a href="http://product.china-pub.com/6876612"> · </a></li>
        </ul>
    </body>
    </html>
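Common HTML Tags: Describing Page Content
    HTML                        the whole document
    HEAD                        document header
    TITLE                       page title
    BODY (background, bgcolor)  page body
    H1, H2, H3                  headings
    I, EM                       italic, emphasis
    B, STRONG                   bold, strong emphasis
    PRE                         preformatted text
    P                           paragraph
    SPAN                        inline container
    DIV                         block container
    BR                          line break
    A (href)                    hyperlink
    IMG (src, width, height)    image
    TABLE (width, border)       table
    TR, TH, TD                  table row, header cell, data cell
    OL, UL                      ordered list, unordered list
    LI                          list item
    DL                          definition list
    INPUT (name)                input field
    SELECT (name)               drop-down list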
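Cascading Style Sheets (CSS): Page Styling
CSS defines how the HTML elements of a page are displayed.
CLASS: an element is given a class with the class attribute, and several elements may share the same class. A class selector is written as "." followed by the class name.
ID: an element is given an id with the id attribute, and an id should be unique within the page. An ID selector is written as "#" followed by the id.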
[Example 9-3] A web page with CSS styles
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def hello():
        return '''<!DOCTYPE HTML>
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title> ID </title>
    <style type="text/css">
    .stress{font-weight:bold;}
    .bigsize{font-size:25px;}
    #stressid{font-weight:bold;}
    #bigsizeid{font-size:25px;}
    </style>
    </head>
    <body>
    <center><h1> </h1></center>
    &nbsp;&nbsp;&nbsp;&nbsp; Corona Virus Disease 2019 COVID-19 2019 2019 <br>
    &nbsp;&nbsp;&nbsp;&nbsp;<span id="stressid">2020 2 11 </span> COVID-19” 2 21 <span class="bigsize"> </span> COVID-19” <span class="stress">2020 8 18 </span> <span id="bigsizeid"> </span>
    </body></html>
    '''

    if __name__ == "__main__":
        app.run()
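Examples of Common Selectors
    Selector          Example     Selects
    .class            .stress     all elements with class="stress"
    #id               #stressid   the element with id="stressid"
    *                 *           all elements
    element           span        all <span> elements
    element element   div p       all <p> elements inside a <div> element
    element,element   div,p       all <div> elements and all <p> elements
    element>element   div>p       all <p> elements whose parent is a <div> element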
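The difference between the descendant selector "div p" and the child selector "div>p" is easy to miss. The sketch below is not a CSS engine; it is a hypothetical helper built on the standard-library HTML parser that only tracks the currently open tags and reports, for each <p>, whether each of the two selectors would match. The sample document is made up, and void tags such as <br> are not handled.

```python
from html.parser import HTMLParser


class ParagraphSelectorDemo(HTMLParser):
    """Record, for every <p>, whether 'div p' and 'div>p' would select it."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # tags currently open, outermost first
        self.results = []     # (id, matches 'div p', matches 'div>p')

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            matches_descendant = "div" in self.open_tags
            matches_child = bool(self.open_tags) and self.open_tags[-1] == "div"
            self.results.append((dict(attrs).get("id"), matches_descendant, matches_child))
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close everything up to and including the matching open tag.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


doc = ('<div><p id="a">one</p><section><p id="b">two</p></section></div>'
       '<p id="c">three</p>')
demo = ParagraphSelectorDemo()
demo.feed(doc)
for pid, descendant, child in demo.results:
    print(pid, "div p:", descendant, "div>p:", child)
```

Paragraph "b" sits inside the <div> but its parent is a <section>, so only "div p" selects it.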
The HTML DOM Tree
[Figure: HTML DOM tree]
Further Reading on HTML and CSS
MOOC: Web, 1-5: https://www.icourse163.org/course/LYKJXY-1207005804
MOOC: Web: https://www.icourse163.org/course/BFU-1003382003
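9.2 The Dash Web Application Framework
Dash is a Python framework built on Flask, Plotly, and React, so an App can be written in Python alone:
    Dash = Flask + Plotly + React
Installing Dash:
    pip install dash
    pip install dash-html-components
    pip install dash-core-components
    pip install dash-table
From Dash 2.0 onward, pip install dash already includes the html, core, and table components.
Dash documentation: https://dash.plotly.com/
Dash App gallery: https://dash-gallery.plotly.host/Portal/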
A Dash Version of a Program Like [Example 9-1]
    import dash
    import dash_html_components as html

    app = dash.Dash(__name__)
    app.layout = html.Div(
        children=[html.H1('Hello Python!')],
        style={'textAlign': 'center'}
    )

    if __name__ == '__main__':
        app.run_server()
app = dash.Dash(__name__) creates the Dash app.
app.layout = html.Div(...) sets the layout of the app.
children=[...] lists the child components.
style={} sets the CSS style.
Dash Compared with HTML
Dash:
    app = dash.Dash(__name__)
    index = html.Div(
        [html.H1('Hello Python!')],
        style={'textAlign': 'center'}
    )
    app.layout = index
HTML:
    <html>
    <head>
    <title></title>
    </head>
    <body>
    <div style="text-align:center">
    <h1>Hello Python!</h1>
    </div>
    </body>
    </html>
[Example 9-4] Using the Dash modules
    import dash
    import dash_core_components as dcc
    import dash_html_components as html
    import plotly.express as px
    import pandas as pd

    external_stylesheets = ['bWLwgP.css']  # the bWLwgP.css stylesheet
    app = dash.Dash(__name__, external_stylesheets=external_stylesheets)  # create the app
    data = pd.read_csv("covid_19.csv")
    fig = px.scatter_geo(data, locations="Country", locationmode="country names",
                         color_discrete_sequence=px.colors.qualitative.Light24,
                         color="Country", size="Cumulative_deaths",
                         hover_data=["Cumulative_cases"], hover_name="Country")
    fig.update_layout(height=450)
    app.layout = html.Div([  # layout of the app
        html.H1(children=' ', style={'textAlign': 'center'}),
        dcc.Markdown(children=""" Corona Virus Disease 2019 COVID-19 2019 2019 """),
        dcc.Graph(figure=fig),
        dcc.Markdown(children=" [ ](https://covid19.who.int/info)",
                     style={'textAlign': 'center'})
    ])

    if __name__ == '__main__':
        app.run_server()
The dash-html-components Module
With dash-html-components, the page is written in Python: the module provides one class per HTML tag.
    >>> import dash_html_components as html
    >>> dir(html)
    ['A', 'Abbr', 'Acronym', 'Address', 'Area', 'Article', 'Aside', 'Audio', 'B', 'Base',
     'Basefont', 'Bdi', 'Bdo', 'Big', 'Blink', 'Blockquote', 'Br', 'Button', 'Canvas',
     'Caption', 'Center', 'Cite', 'Code', 'Col', 'Colgroup', 'Command', 'Content', 'Data',
     'Datalist', 'Dd', 'Del', 'Details', 'Dfn', 'Dialog', 'Div', 'Dl', 'Dt', 'Element', 'Em',
     'Embed', 'Fieldset', 'Figcaption', 'Figure', 'Font', 'Footer', 'Form', 'Frame',
     'Frameset', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6', 'Header', 'Hgroup', 'Hr', 'I', 'Iframe',
     'Img', 'Ins', 'Isindex', 'Kbd', 'Keygen', 'Label', 'Legend', 'Li', 'Link', 'Listing',
     'Main', 'MapEl', 'Mark', 'Marquee', 'Meta', 'Meter', 'Multicol', 'Nav', 'Nextid', 'Nobr',
     'Noscript', 'ObjectEl', 'Ol', 'Optgroup', 'Option', 'Output', 'P', 'Param', 'Picture',
     'Plaintext', 'Pre', 'Progress', 'Q', 'Rb', 'Rp', 'Rt', 'Rtc', 'Ruby', 'S', 'Samp',
     'Script', 'Section', 'Select', 'Shadow', 'Slot', 'Small', 'Source', 'Spacer', 'Span',
     'Strike', 'Strong', 'Sub', 'Summary', 'Sup', 'Table', 'Tbody', 'Td', 'Template',
     'Textarea', 'Tfoot', 'Th', 'Thead', 'Time', 'Title', 'Tr', 'Track', 'U', 'Ul', 'Var',
     'Video', 'Wbr', 'Xmp', '_', '__all__', '__builtins__', '__cached__', '__doc__',
     '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__',
     '__version__', '_basepath', '_component', '_css_dist', '_current_path', '_dash',
     '_filepath', '_imports_', '_js_dist', '_os', '_sys', '_this_module', 'f', 'json',
     'package', 'package_name']
Conversion Between Python Modules and HTML
Python code:
    import dash_html_components as html
    html.Div([
        html.H1('Hello Dash'),
        html.Div([
            html.P('Dash converts Python classes into HTML'),
            html.P("This conversion happens behind the scenes by Dash's JavaScript front-end")
        ])
    ])
Resulting HTML, where html.Div(...) becomes <div>...</div> and html.H1(...) becomes <h1>...</h1>:
    <div>
        <h1>Hello Dash</h1>
        <div>
            <p>Dash converts Python classes into HTML</p>
            <p>This conversion happens behind the scenes by Dash's JavaScript front-end</p>
        </div>
    </div>
HTML Tag Attributes and CSS Styles
In Dash, style is a dictionary whose keys are the camelCase forms of the CSS property names, and the HTML class attribute is written className.
    html.Div([
        html.Div('Example Div', style={'color': 'blue', 'fontSize': 14}),
        html.P('Example P', className='my-class', id='my-p-element')
    ], style={'marginBottom': 50, 'marginTop': 25})
Dash converts this to the HTML:
    <div style="margin-bottom: 50px; margin-top: 25px;">
        <div style="color: blue; font-size: 14px">
            Example Div
        </div>
        <p class="my-class" id="my-p-element">
            Example P
        </p>
    </div>
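The mapping from a Dash style dictionary to a CSS string can be sketched in a few lines. This is an illustration only, not how Dash or React implement it: `camel_to_kebab`, `style_to_css` and the small `UNITLESS` set are hypothetical names introduced here, and the rule "bare numbers become px" is simplified.

```python
import re

# A few CSS properties whose numeric values carry no unit (illustrative subset).
UNITLESS = {"opacity", "z-index", "font-weight", "line-height", "flex"}


def camel_to_kebab(name):
    """Turn a camelCase key such as 'marginBottom' into 'margin-bottom'."""
    return re.sub(r"([A-Z])", lambda m: "-" + m.group(1).lower(), name)


def style_to_css(style):
    """Render a style dictionary as the text of an HTML style attribute."""
    parts = []
    for key, value in style.items():
        prop = camel_to_kebab(key)
        if isinstance(value, (int, float)) and prop not in UNITLESS:
            value = f"{value}px"   # bare numbers are treated as pixels
        parts.append(f"{prop}: {value}")
    return "; ".join(parts)


print(style_to_css({'marginBottom': 50, 'marginTop': 25}))
print(style_to_css({'color': 'blue', 'fontSize': 14}))
```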
Writing Page Content with Markdown
Markdown is a convenient way to write formatted text; in Dash it is rendered with dcc.Markdown.
    import dash_core_components as dcc
    dcc.Markdown('''
    #### Dash and Markdown
    Dash supports [Markdown](http://commonmark.org/help).
    Markdown is a simple way to write and format text.
    It includes a syntax for things like **bold text** and *italics*,
    [links](http://commonmark.org/help), inline `code` snippets, lists,
    quotes, and more.
    ''')
Structure of a Dash Application
A Dash Web application is written in Python; no JavaScript has to be written.
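Div Layout: Designing the Program Interface
The page is divided into blocks with html.Div. An html.Div can be nested inside another html.Div, its size is set with height and width, and its content is given in children.
dash_html_components provides the HTML tags.
dash_core_components provides higher-level components built from JavaScript, HTML, and CSS:
    import dash_core_components as dcc
dcc.Graph displays a plotly figure.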
[Example 9-5] Layout with the dash_bootstrap_components module
    import dash
    import dash_core_components as dcc
    import dash_html_components as html
    import dash_bootstrap_components as dbc
    import plotly.express as px
    import pandas as pd

    # bootstrap.min.css is the Bootstrap CSS file, stored in the assets folder
    external_stylesheets = ['bootstrap.min.css']
    app = dash.Dash(__name__, external_stylesheets=external_stylesheets)
    # covid19.jpg: use "/" in the path, also on Windows
    pic = '/assets/covid19.jpg'
    data = pd.read_csv("covid_19.csv")  # the covid_19.csv data file
    fig = px.scatter_geo(data, locations="Country", locationmode="country names",
                         color_discrete_sequence=px.colors.qualitative.Light24,
                         color="Country", size="Cumulative_deaths",
                         hover_data=["Cumulative_cases"], hover_name="Country")
    secondline = dbc.Container([
        dbc.Row(children=[
            dbc.Col([html.Br(),
                     html.H4(""" 2019 2019 """),
                     html.Img(src=pic, height="130px", style={'marginTop': 15}),
                     ], md=3),
            dbc.Col(html.Div(dcc.Graph(figure=fig)), md=9)
        ])
    ])
    app.layout = html.Div(children=[
        html.H1(children=' ', style={"textAlign": "center", 'marginTop': 15}),  # row 1
        secondline,  # row 2
        dcc.Markdown(children=" [ ](https://covid19.who.int/info)",  # row 3
                     style={'textAlign': 'center', 'fontSize': 20}),
    ])

    if __name__ == '__main__':
        app.run_server()
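dash_bootstrap_components Styles
dash_bootstrap_components takes its styles from 'bootstrap.min.css'. The page width (960px) is divided into a 12-column grid, and the rows and columns are styled by the CSS.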
A Layout with Three Rows and Two Columns
In [Example 9-5], app.layout has three rows: the html.H1 title, secondline, and the dcc.Markdown link. The second row, secondline, is a dbc.Row with two dbc.Col columns.
Each row is 12 units wide: md=3 gives a column 3 of the 12 units, and md=9 gives the other column the remaining 9.
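The arithmetic behind md=3 and md=9 can be checked directly. The sketch below assumes the 12-unit grid described above and a screen wide enough for the md settings to apply; `column_share` and the labels in `spans` are hypothetical names used only for this illustration.

```python
GRID_COLUMNS = 12  # width of one row in grid units


def column_share(md):
    """Fraction of the row width taken by a column spanning `md` grid units."""
    if not 1 <= md <= GRID_COLUMNS:
        raise ValueError("md must be between 1 and 12")
    return md / GRID_COLUMNS


# The two columns of the second row: text and image on the left, graph on the right.
spans = {"text and image": 3, "graph": 9}
total = sum(spans.values())
for name, md in spans.items():
    print(f"{name}: md={md} -> {column_share(md):.0%} of the row")
print("units used:", total, "of", GRID_COLUMNS)
```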
Each Block Is a Box
The CSS box model: https://www.runoob.com/css/css-boxmodel.html
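Input and Output on a Web Page
Import 'Input' and 'Output' from dash.dependencies, and use the @app.callback decorator to connect the inputs on the page to its outputs.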
[Example 9-6] Entering two numbers and computing their sum
    import dash
    import dash_core_components as dcc
    import dash_html_components as html
    from dash.dependencies import Input, Output

    external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
    app = dash.Dash(__name__, external_stylesheets=external_stylesheets)
    app.layout = html.Div([
        html.H2(" ", style={"textAlign": "center"}),
        html.Div([" : ", dcc.Input(id='my-input1', value='0', type='text')]),
        html.Div([" : ", dcc.Input(id='my-input2', value='0', type='text')]),
        html.Br(),
        html.Div(id='my-output'),
    ])

    @app.callback(Output('my-output', component_property='children'),
                  Input('my-input1', component_property='value'),
                  Input('my-input2', component_property='value'))
    def update_output_div(value1, value2):
        return ' : {}'.format(int(value1) + int(value2))

    if __name__ == '__main__':
        app.run_server()
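The callback above runs on every keystroke, so int() can meet an empty or non-numeric string. One possible way to guard against that is sketched below as a plain function that a callback could call; `format_sum` and both message texts are hypothetical, not part of the original example.

```python
def format_sum(value1, value2):
    """Return the text to display for two input-box values."""
    try:
        total = int(value1) + int(value2)
    except (TypeError, ValueError):
        # Raised for None, '' or text that is not an integer.
        return "Please enter two integers"
    return "Sum: {}".format(total)


print(format_sum("3", "4"))
print(format_sum("", "4"))
```

Keeping the calculation in an ordinary function also makes it testable without starting the Dash server.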
Figures Produced with Plotly Can Still Be Used
[Example 9-7] Program
    import dash
    import dash_core_components as dcc
    import dash_html_components as html
    import pandas as pd
    import plotly.express as px

    dataset = pd.read_excel("gapminder.xlsx")
    fig = px.scatter(dataset, x="income", y="life_exp", animation_frame="year",
                     animation_group="country", size="income", color="continent",
                     hover_name="country", log_x=True, size_max=45,
                     range_x=[500, 200000], range_y=[25, 90],
                     labels=dict(income=" (PPP )", life_exp=" 寿 "))
    app = dash.Dash(__name__)
    app.layout = html.Div([
        html.H1(
            children=' 寿 ',
            style={'textAlign': 'center', 'color': "#111111"}
        ),
        html.Div([
            # Column 1
            html.Div([html.Br([]), html.Br([]),
                      html.H6(' 使 '),
                      html.P('Pandas-- '),
                      html.P('Dash--Web ')], className="two columns"),
            # Column 2
            html.Div([dcc.Graph(figure=fig)], className="ten columns"),
        ], className="row ")])
    app.run_server()
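CSS, JS, IMG, and Other Files
CSS, JS, and image files are placed in the assets folder. With Dashexternal.css stored in assets, its classes can be used through className, for example className="two columns".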
Navigation Bar
    import dash_bootstrap_components as dbc
dash_bootstrap_components provides dbc.Nav and dbc.NavbarSimple:
    dbc.NavbarSimple(brand="Python --EFT50", color="dark",
                     dark=True,
                     brand_href="https://baike.baidu.com/item/%E4%B8%8A%E8%AF%8150ETF/6453037?fr=aladdin",
                     sticky="top")
    dbc.Nav([dbc.NavLink(html.H3(" 线 "), href="/", active="exact"),
             dbc.NavLink(html.H3("K 线 "), href="/candle", active="exact"),
             ], vertical="md")
[Example 9-8] The navigation bar as input: switching between pages (1)
    import dash
    import dash_core_components as dcc
    import dash_html_components as html
    import dash_bootstrap_components as dbc
    from draw import line, candle
    from dash.dependencies import Input, Output

    external_stylesheets = ['bootstrap.min.css', 'bWLwgP.css']  # the CSS files used
    app = dash.Dash(__name__, external_stylesheets=external_stylesheets)
    first = dbc.NavbarSimple(brand="Python --EFT50", color="dark",
                             dark=True,
                             brand_href="https://baike.baidu.com/item/%E4%B8%8A%E8%AF%8150ETF/6453037?fr=aladdin",
                             sticky="top")
    second = html.H1(children='ETF50 线 (2018.1.1-2018.6.30)',
                     style={'textAlign': 'center', 'color': "black"})
    third = dbc.Container([
        dbc.Row([
            dbc.Col([html.Br([]), html.Br([]),
                     dbc.Nav([dbc.NavLink(html.H3(" 线 "), href="/", active="exact"),
                              dbc.NavLink(html.H3("K 线 "), href="/candle", active="exact"),
                              ], vertical="md"),
                     html.H1(' 使 :'),
                     html.H3('Pandas-- '),
                     html.H3('Plotly-- '),
                     html.H3('Dash--Web ')
                     ], md=3, style={'textAlign': 'right'}),
            dbc.Col(dcc.Graph(id='page-content'), md=9)],
        )
    ])
The navigation bar as input: switching between pages (2)
    @app.callback(Output("page-content", "figure"), [Input("url", "pathname")])
    def render_page_content(pathname):
        if pathname == "/":
            return line()
        else:
            return candle()

    # Dash style keys are camelCase, and a hex colour needs the leading "#".
    app.layout = html.Div([dcc.Location(id="url"), first,
                           second, html.Br([]), html.Br([]),
                           third, html.Br([]), html.Br([])],
                          style={"backgroundColor": "#E0E0E0"})
    app.run_server()
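With more than two pages, the if/else in the callback can be replaced by a lookup table. The sketch below shows the idea without Dash; `line` and `candle` here are hypothetical stand-ins that return strings in place of the figures built in the draw module, and `ROUTES` is a name introduced for illustration.

```python
def line():
    return "line chart"          # stand-in for the figure returned by draw.line


def candle():
    return "candlestick chart"   # stand-in for the figure returned by draw.candle


# Map each URL path used by the navigation links to the function that builds the page.
ROUTES = {"/": line, "/candle": candle}


def render_page_content(pathname):
    """Pick the page for a path, falling back to the line chart for unknown paths."""
    builder = ROUTES.get(pathname, line)
    return builder()


print(render_page_content("/"))
print(render_page_content("/candle"))
print(render_page_content("/unknown"))
```

Note the design difference: the original else branch sends every unknown path to the candlestick page, whereas this sketch falls back to the line chart; either default may be appropriate.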
Integrating Flask and Dash
The Flask page is served at / and the Dash app at /dash/.
    import flask
    import dash
    import dash_core_components as dcc
    import dash_html_components as html
    import dash_bootstrap_components as dbc
    import pandas as pd
    import plotly.express as px

    external_stylesheets = ['bootstrap.min.css']
    dataset = pd.read_excel("gapminder.xlsx")
    fig = px.scatter(dataset, x="income", y="life_exp", animation_frame="year",
                     animation_group="country", size="income", color="continent",
                     hover_name="country", log_x=True, size_max=45,
                     range_x=[500, 200000], range_y=[25, 90],
                     labels=dict(income=" (PPP )", life_exp=" 寿 "))

    server = flask.Flask(__name__)

    @server.route('/')
    def index():
        return 'Hello Flask '

    app = dash.Dash(__name__, server=server, routes_pathname_prefix='/dash/',
                    external_stylesheets=external_stylesheets)
    app.layout = html.Div([
        html.H1(
            children=' 寿 ',
            style={'textAlign': 'center', 'color': "#111111"}
        ),
        # md is an argument of dbc.Col; html.Div does not accept it
        dbc.Row([
            # Column 1
            dbc.Col([html.Br([]), html.Br([]),
                     html.H6(' 使 '),
                     html.P('Pandas-- '),
                     html.P('Dash--Web ')], md=2),
            # Column 2
            dbc.Col([dcc.Graph(figure=fig)], md=10),
        ])])

    if __name__ == '__main__':
        server.run()
9.3 Deploying a Web Application on the PythonAnywhere Site
1. Open www.pythonanywhere.com, choose "Pricing & signup", and create a Beginner account.
2. Log in. The account used in this chapter is etf50.
Creating a Minimal App with Flask
Create a Flask App on the site. Opening etf50.pythonanywhere.com then shows: Hello from Flask!
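Uploading the Program to the Server
Use "New directory" to create the assets directory and "Upload a file" to upload the files. Put bootstrap.min.css and bWLwgP.css in assets. The server runs Linux.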
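Common Linux Commands
Paths in Linux: / is the root directory and the path separator, . is the current directory, .. is the parent directory, and ~ is the user's home directory.
    ls      list the contents of a directory
    cd      change the current directory
    pwd     print the current directory
    mkdir   create a directory
    rmdir   remove an empty directory
    cp      copy files
    cat     print the contents of a file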
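For readers more at home in Python than in a shell, the same operations have standard-library counterparts. The sketch below works inside a temporary directory so that nothing on the real system is touched; the file name demo.css and its content are made up for the illustration.

```python
import os
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)                              # stand-in for the home directory ~
    assets = home / "assets"
    assets.mkdir()                                # mkdir assets
    css = assets / "demo.css"
    css.write_text("body { margin: 0; }")
    shutil.copy(css, home / "copy.css")           # cp assets/demo.css copy.css
    listing = sorted(os.listdir(home))            # ls
    content = (home / "copy.css").read_text()     # cat copy.css
    parent_is_home = (assets / "..").resolve() == home.resolve()   # .. is the parent
    css.unlink()                                  # rm assets/demo.css
    assets.rmdir()                                # rmdir assets (must be empty)
    after = sorted(os.listdir(home))

print(listing)
print(content)
print(after)
```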
[Example 9-9] Modifying the flask.app program
In draw.py, read the data file through its absolute path on the server:
    pd.read_csv("/home/etf50/mysite/etf50.csv")
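Hard-coding the server path means the program needs editing whenever it moves between the local machine and the server. One common alternative, sketched here as an assumption rather than as the chapter's method, is to build the path from the directory that contains the script; `data_path` is a hypothetical helper.

```python
from pathlib import Path


def data_path(filename, base=None):
    """Return the path of a data file stored next to the program.

    `base` defaults to the directory of the current script; pass it
    explicitly when there is no script file (for example in a console).
    """
    if base is None:
        base = Path(__file__).resolve().parent
    return Path(base) / filename


# On the server used in this chapter the program directory is /home/etf50/mysite.
csv_file = data_path("etf50.csv", base="/home/etf50/mysite")
print(csv_file.as_posix())
```

Inside draw.py the call would then read `pd.read_csv(data_path("etf50.csv"))`, with the same code running locally and on the server.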
Starting the Web Application with the Reload Button
Click Reload to restart the Web application after the program has changed.
Installing the dash Modules on the PythonAnywhere Cloud Server
On PythonAnywhere, open Consoles, start a Bash console, and enter the Linux commands:
    $ pip3.7 install --user dash
    $ pip3.7 install --user dash-html-components
    $ pip3.7 install --user dash-core-components
If Python 3.6 is used:
    $ pip3.6 install --user dash
    $ pip3.6 install --user dash-html-components
    $ pip3.6 install --user dash-core-components
9.4 Web Crawlers
In Python, HTTP requests are commonly made with the requests module. The requests-html module builds on requests and adds HTML parsing. requests-html supports only Python 3.6 and later.
Features of the Requests-html Module
Full JavaScript support!
CSS Selectors (a.k.a jQuery-style, thanks to PyQuery).
XPath Selectors, for the faint at heart.
Mocked user-agent (like a real web browser).
Automatic following of redirects.
Connection-pooling and cookie persistence.
The Requests experience you know and love, with magical parsing abilities.
Async Support
Contents of the Module
    >>> import requests_html
    >>> dir(requests_html)
    ['AsyncHTMLSession', 'BaseParser', 'BaseSession', 'Cleaner', 'DEFAULT_ENCODING',
     'DEFAULT_NEXT_SYMBOL', 'DEFAULT_URL', 'DEFAULT_USER_AGENT', 'Element', 'HTML',
     'HTMLResponse', 'HTMLSession', 'HtmlElement', 'List', 'MaxRetries', 'MutableMapping',
     'Optional', 'PyQuery', 'Result', 'Set', 'ThreadPoolExecutor', 'TimeoutError', 'Union',
     'UserAgent', '_Attrs', '_BaseHTML', '_Containing', '_DefaultEncoding', '_Encoding',
     '_Find', '_HTML', '_LXML', '_Links', '_Next', '_NextSymbol', '_RawHTML', '_Result',
     '_Search', '_Text', '_URL', '_UserAgent', '_XPath', '__builtins__', '__cached__',
     '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__',
     '_get_first_or_list', 'asyncio', 'cleaner', 'etree', 'findall', 'html_to_unicode',
     'lxml', 'lxml_html_tostring', 'parse_search', 'partial', 'pyppeteer', 'requests',
     'soup_parse', 'sys', 'urljoin', 'urlparse', 'urlunparse', 'user_agent', 'useragent']
The r.html Submodule
r.html is an object of the class requests_html.HTML; most of the HTML processing in requests_html is done through it.
    >>> dir(r.html)
    ['__aiter__', '__anext__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__',
     '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
     '__init_subclass__', '__iter__', '__le__', '__lt__', '__module__', '__ne__', '__new__',
     '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__',
     '__str__', '__subclasshook__', '__weakref__', '_async_render', '_encoding', '_html',
     '_lxml', '_make_absolute', '_pq', 'absolute_links', 'add_next_symbol', 'arender',
     'base_url', 'default_encoding', 'element', 'encoding', 'find', 'full_text', 'html',
     'links', 'lxml', 'next', 'next_symbol', 'page', 'pq', 'raw_html', 'render', 'search',
     'search_all', 'session', 'skip_anchors', 'text', 'url', 'xpath']
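Commonly Used Features of r.html
    r.html.links             the links on the page, as written in the page
    r.html.absolute_links    the links on the page, in absolute form
    r.html.find(...)         select HTML elements with a CSS selector
    r.html.xpath(...)        select HTML elements with an XPath expression
For an element e taken from the list returned by r.html.find(...):
    e.text                   the text of the element
    e.attrs                  the attributes of the element
    e.html                   the HTML source of the element
    e.search(...)            search the text of the element with a template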
[Example 9-10] Getting the links of a web page
    from requests_html import HTMLSession

    session = HTMLSession()                        # create a session
    r = session.get('https://docs.python.org/3.7') # fetch the page
    # r.html.absolute_links is the set of absolute links on the page
    for url in list(r.html.absolute_links)[:10]:
        print(url)
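The difference between links and absolute links can be shown offline with the standard library. This sketch is not how requests-html works internally; `LinkCollector`, the sample page, and the base URL are assumptions made for the illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)


# Made-up page with one relative and one absolute link.
page = ('<a href="tutorial/index.html">Tutorial</a> '
        '<a href="https://www.python.org/">Python</a>')
base_url = "https://docs.python.org/3.7/"

collector = LinkCollector()
collector.feed(page)

# A relative link is resolved against the URL of the page it appears on.
absolute_links = {urljoin(base_url, link) for link in collector.links}
for url in sorted(absolute_links):
    print(url)
```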
[Example 9-11] Analysing an HTML document
    from requests_html import HTML

    doc = """<div class="sidebar-extra"><p> web 使 Python </p>
    <ul>
    <li>Plotly -- </li><li>Pandas -- </li>
    <li>Tushare -- </li><li>Flask -- Web </li>
    </ul>
    </div>"""
    html = HTML(html=doc)
    elements = html.find('div.sidebar-extra')
    print(elements)
    print("----------------------")
    for element in elements:
        print(element.text)
        d = element.attrs  # attributes of the element, as a dictionary
        print(d['class'])
    print("----------------------")
    eps = html.find('div.sidebar-extra>p')
    print(eps)
    for ep in eps:
        print(ep.text)
    print("----------------------")
    eus = html.find('div.sidebar-extra>ul')
    print(eus)
    for eu in eus:
        print(eu.text)
Getting the Static Content of a Web Page
The page https://www.dxsbb.com/news/7566.html contains a 2017-2018 ranking. Inspect the elements of the page in Chrome.
[Example 9-12] Program to get the ranking
Output of the program
Getting Geographic Data in JSON Format
The data comes from https://geo.datav.aliyun.com/areas_v2/bound/100000_full.json
    {
        "type": "FeatureCollection",
        "name": "100000_full",
        "features": [
            {
                "type": "Feature",
                "properties": {
                    "adcode": "110000",
                    "name": " ",
                    "center": [116.405285, 39.904989],
                    ...
                },
                ...
            },
            {
                "type": "Feature",
                "properties": {
                    "adcode": "120000",
                    "name": " ",
                    "center": [117.190182, 39.125596],
                    ...
                },
                ...
            },
            ...
        ]
    }
When the Page Is JSON Data
    from requests_html import HTMLSession
    import jsonpath
    import json

    session = HTMLSession()
    url = 'https://geo.datav.aliyun.com/areas_v2/bound/100000_full.json'
    r = session.get(url)
    html_str = r.html.html
    # convert the json text to a python object
    jsonobj = json.loads(html_str)
    # extract every name and center
    citylist = jsonpath.jsonpath(jsonobj, '$..name')
    centerlist = jsonpath.jsonpath(jsonobj, '$..center')
    # citylist[0] is the top-level name "100000_full", so it is skipped
    datalist = zip(citylist[1:], centerlist)
    for city in datalist:
        print(city)
Output:
    (' ', [116.405285, 39.904989])
    (' ', [117.190182, 39.125596])
    (' ', [114.502461, 38.045474])
    (' 西 ', [112.549248, 37.857014])
    (' ', [111.670801, 40.818311])
    (' ', [123.429096, 41.796767])
    (' ', [125.3245, 43.886841])
    (' ', [126.642464, 45.756967])
    (' ', [121.472644, 31.231706])
    (' ', [118.767413, 32.041544])
    (' ', [120.153576, 30.287459])
    (' ', [117.283042, 31.86119])
    (' ', [119.306239, 26.075302])
    (' 西 ', [115.892151, 28.676493])
    (' ', [117.000923, 36.675807])
    (' ', [113.665412, 34.757975])
    (' ', [114.298572, 30.584355])
    (' ', [112.982279, 28.19409])
    (' 广 ', [113.280637, 23.125178])
    (' 广 西 ', [108.320004, 22.82402])
    (' ', [110.33119, 20.031971])
    (' ', [106.504962, 29.533155])
    (' ', [104.065735, 30.659462])
    (' ', [106.713478, 26.578343])
    (' ', [102.712251, 25.040609])
    (' 西 ', [91.132212, 29.660361])
    (' 西 ', [108.948024, 34.263161])
    (' ', [103.823557, 36.058039])
    (' ', [101.778916, 36.623178])
    (' ', [106.278179, 38.46637])
    (' ', [87.617733, 43.792818])
    (' ', [121.509062, 25.044332])
    (' ', [114.173355, 22.320048])
    (' ', [113.54909, 22.198951])
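The same name and center pairs can be read with the json module alone by walking the features one at a time, which keeps each name next to its own center even if some feature has no center. The sketch below runs on a small inline sample with the same structure; the names "A", "B" and "C" are placeholders, not values from the real file.

```python
import json

# Inline sample with the structure shown above; names are placeholders.
sample = """
{"type": "FeatureCollection", "name": "100000_full", "features": [
  {"type": "Feature", "properties": {"adcode": "110000", "name": "A",
                                     "center": [116.405285, 39.904989]}},
  {"type": "Feature", "properties": {"adcode": "120000", "name": "B",
                                     "center": [117.190182, 39.125596]}},
  {"type": "Feature", "properties": {"adcode": "000000", "name": "C"}}
]}
"""

data = json.loads(sample)

# Take name and center from the same feature; skip features without a center.
pairs = [
    (feature["properties"]["name"], feature["properties"]["center"])
    for feature in data["features"]
    if "center" in feature["properties"]
]
for pair in pairs:
    print(pair)
```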
Getting the Dynamic Content of a Web Page
The page https://ncov.dxy.cn/ncovh5/view/pneumonia?from=singlemessage&isappinstalled=0 builds its content with javascript, so the render method of requests_html is used.
Inspecting the Elements of a Web Page
Inspect the elements of the page in Chrome.
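JavaScript Rendering: Running a javascript Program
For pages whose content is produced by JavaScript, the HTML object of requests-html provides the render method. The first time render is called, it downloads chromium into ~/.pyppeteer/ and then uses chromium to run the JS.
    # script is a javascript program: it finds the elements with class
    # "expandRow___1Y0WD" using document.getElementsByClassName and
    # clicks the first one with submit[0].click()
    script = """() => {
        var submit=document.getElementsByClassName("expandRow___1Y0WD");
        submit[0].click()
    }"""
    r.html.render(script=script)  # render with JavaScript and run the javascript program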
Extracting the Data with html_str = r.html.find('.areaBox___Sl7gp')
    html_str = r.html.find('.areaBox___Sl7gp')
Converting a String to html Format
The html string is converted to an HTML object, from which the blocks are selected:
    html = HTML(html=html_str[0].html)
    provinces = html.find("div.areaBlock1___3qjL7")
Content of html_str[0].html:
    >>> html_str[0].html
    <div class="areaBlock1___3qjL7"><p class="subBlock1___3cWXy"><img alt="" src="" class="close___Hz1_J"/> </p><p class="subBlock2___2BONl">174</p><p class="subBlock3___3dTLM">11,524</p><p class="subBlock4___3SAto">205</p><p class="subBlock5___33XVW">11,145</p><p class="subBlock6___3B_P6"><span class="alink___38BGN"><span class="content___2NBbQ"> </span><img alt="icon" src="" class="icon___3tHFb"/></span></p></div>
Splitting the Data
    for province in provinces[2:]:
        data = HTML(html=province.html)
        print(data.find("p.subBlock1___3cWXy")[0].text, " ",
              data.find("p.subBlock2___2BONl")[0].text, " ",
              data.find("p.subBlock3___3dTLM")[0].text, " ",
              data.find("p.subBlock4___3SAto")[0].text, " ",
              data.find("p.subBlock5___33XVW")[0].text)
Content of province:
    >>> province.html
    '<div class="areaBlock1___3qjL7"><p class="subBlock1___3cWXy"><img alt="" src="" class="close___Hz1_J"/> </p><p class="subBlock2___2BONl">174</p><p class="subBlock3___3dTLM">11,524</p><p class="subBlock4___3SAto">205</p><p class="subBlock5___33XVW">11,145</p><p class="subBlock6___3B_P6"><span class="alink___38BGN"><span class="content___2NBbQ"> </span><img alt="icon" src="" class="icon___3tHFb"/></span></p></div>'
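The values scraped from the page are text, and some contain thousands separators such as 11,524, so they need converting before any arithmetic. A minimal sketch follows; `to_int` is a hypothetical helper, and the sample row reuses the four numbers shown in province.html without assuming what each column means.

```python
def to_int(text):
    """Convert scraped text such as '11,524' to the integer 11524."""
    return int(text.replace(",", "").strip())


# The four numeric fields from the block shown above, as scraped text.
row = ["174", "11,524", "205", "11,145"]
numbers = [to_int(field) for field in row]
print(numbers)
```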
Complete Program for Getting the covid-19 Data of Each Province
    from requests_html import HTMLSession
    from requests_html import HTML

    session = HTMLSession()
    url = 'https://ncov.dxy.cn/ncovh5/view/pneumonia?from=singlemessage&isappinstalled=0'
    r = session.get(url)
    # script is a javascript program: it finds the elements with class
    # "expandRow___1Y0WD" using document.getElementsByClassName and
    # clicks the first one with submit[0].click()
    script = """() => {
        var submit=document.getElementsByClassName("expandRow___1Y0WD");
        submit[0].click()
    }"""
    r.html.render(script=script)  # render with JavaScript and run the javascript program
    html_str = r.html.find('.areaBox___Sl7gp')
    html = HTML(html=html_str[0].html)
    provinces = html.find("div.areaBlock1___3qjL7")
    print(" ")
    for province in provinces[2:]:
        data = HTML(html=province.html)
        print(data.find("p.subBlock1___3cWXy")[0].text, " ",
              data.find("p.subBlock2___2BONl")[0].text, " ",
              data.find("p.subBlock3___3dTLM")[0].text, " ",
              data.find("p.subBlock4___3SAto")[0].text, " ",
              data.find("p.subBlock5___33XVW")[0].text)